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We introduce the EMC algorithm for reconstructing a particle's 3D diffraction intensity from very many 
photon shot-noise limited 2D measurements, when the particle orientation in each measurement is unknown. 
The algorithm combines a maximization step (M) of the intensity's likelihood function, with expansion (E) and 
compression (C) steps that map the 3D intensity model to a redundant tomographic representation and back 
again. After a few iterations of the EMC update rule, the reconstructed intensity is given to the difference- 
map algorithm for reconstruction of the particle contrast. We demonstrate reconstructions with simulated data 
and investigate the effects of particle complexity, number of measurements, and the number of photons per 
measurement. The relatively transparent scaling behavior of our algorithm provides a first estimate of the data 
processing resources required for future single-particle imaging experiments. 



PACS numbers: 42.30.Rx, 42.30. Wb 

I. INTRODUCTION 

If the goal of single-particle imaging by free electron x- 
ray lasers JJ] is realized in the next few years, the disciplines 
of imaging and microscopy will have partly merged with ele- 
mentary particle physics. Even with the enormous flux of the 
new light sources, the scattered radiation will be detected as 
individual photons and hardly resemble diffraction "images" 
(Figure 1). The data in these experiments will instead resem- 
ble the particle debris produced in elementary particle colli- 
sions. 

The particle physics analogy is imperfect, however. Data 
analysis in elementary particle experiments is complicated 
more as the result of complex interactions than complexity of 
the structures — consider the pions produced when a proton 
is probed with a photon. By contrast, the fundamental inter- 
actions between x-ray photons and electrons in a molecule are 
very simple and the complexity in the analysis of the data is 
entirely the result of structure. 

There are two different data analysis challenges that x-ray 
laser studies of single-particles will have to face. Consider 
the two simulated detector outputs shown in Figure 1. Are 
the photon counts different because the molecule presented a 
different orientation to the x-ray beam; is the difference at- 
tributable to the statistics of a shot-noise limited signal; or 
does some combination of the two apply? It is reasonable to 
conjecture that by collecting sufficiently many data, the ori- 
entational and statistical uncertainties can be disentangled to 
produce molecular reconstructions with acceptable noise and 
resolution. In this paper we present strong evidence in support 
of this conjecture by means of an algorithm that succeeds with 
simulated data. 

Due to the length of this paper a survey of its contents 
may be useful to the reader. Section [TT] explains the theoret- 
ical basis of our algorithm whose success is contingent upon 
an information theoretic noise criterion. Sections [TTT1 [IV] and 
[VI respectively describe the test target particles, experimen- 
tal diffraction conditions and algorithmic parameters used in 
these single-particle imaging simulations. The limits and en- 
couraging results of these simulations, whose code implemen- 
tation we elaborate in section IVI1 are presented in section 




FIG. 1: (Color online) The same or different? Two simulated mea- 
surements (noisy diffraction patterns) in a single particle imaging ex- 
periment, where color (white, red, green, blue) represents recorded 
photon counts (0, 1, 2, 3). Are the differences in the measurements 
purely statistical, or do they reflect a different view (orientation) of 
the particle? 

IVIII Finally, section [Villi discusses the scaling of our algo- 
rithm's computational requirements with reconstructed reso- 
lution. Additional relevant technical details of our paper are 
recorded in its appendices. 

II. THEORY 

The statistical noise and missing orientational information 
can be addressed by imposing internal consistency of two 
kinds. First consider the shot noise of the detected signal. 
Suppose a collection of data sets, such as the pair in Figure 
1, have been identified as candidates for data taken with the 
molecule in nearly the same orientation. While simply aver- 
aging the photon counts yields the continuous signal we are 
after, we have available the stronger test that the distribution 
of counts for the measurement ensemble, at each pixel, has 
the correct Poissonian form. If the test fails, then a different 
subset of the data must be identified which has this property. 
Statistical consistency, by itself, is thus a means for classify- 
ing like-oriented data sets. 

The purely statistical analysis makes no reference to the 
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structure implicit in the missing orientational information. 
This structure begins with the basic fact that the missing infor- 
mation comprises just three continuous variables (e.g. Euler 
angles), and extends to more detailed constraints, such as the 
fact that the data samples a signal on a spherical (Ewald) sur- 
face in three dimensions and different spherical samples have 
common values along their intersection, etc. A successful 
data analysis scheme for the single particle imaging experi- 
ments will not just have to signal-average shot noise, but must 
also reconstruct the missing orientational information by rely- 
ing on internal consistency associated with the rotation group. 
The two forms of uncertainty, statistical and orientational, are 
not independent. In particular, when the statistical noise is 
large (few detected photons), we expect the reconstruction of 
the orientational information to be probabilistic in character 
(i.e. distributions of angles as opposed to definite values). 



A. Noise criterion 

A natural question to ask is whether there exists an informa- 
tion theoretic criterion that would apply to any reconstruction 
algorithm and that can be used to evaluate the feasibility of 
reconstructions for particular experimental parameters. One 
of us B2J] studied this question for a minimal model with a 
single rotation angle and obtained an explicit noise criterion 
formula. Although such a detailed analysis is difficult to ex- 
tend to the three dimensional geometry of the single-particle 
imaging experiments, the mathematical statement of the cri- 
terion is completely general and, when evaluated numerically 
by the reconstruction algorithm, serves as a useful diagnostic. 
We include a brief discussion of the criterion here and refer 
the reader to the original article for details. 

We recall that the mutual information I(X, Y) associated 
with a pair of random variables X and Y is an information 
theoretic measure of their degree of correlation: I(X, Y) is 
the average information in bits that a measurement of X re- 
veals about Y (or conversely). Keeping with the notation of 
references ||2j, |3[], we denote the three dimensional intensity 
distribution by W, the photon counts recorded by the detec- 
tor on a two dimensional spherical surface by K, and the three 
unknown parameters that specify the orientation of the surface 
within the intensity space by 17. The intensities W have the 
interpretation of random variables (just as K and 17), since 
the particle being reconstructed (and so the associated W) 
belongs to a statistical ensemble with known characteristics 
(size, intrinsic resolution, etc.). 

There are three forms of mutual information that arise in 
the framework where information about a model W is ob- 
tained through measurement of data K that is both statisti- 
cally uncertain and incomplete (because 17 is not measured). 
The first is I(K, W) and measures the information obtained 
about the model intensities W from a typical unoriented mea- 
surement K. A second mutual information is I(K, SY)\w> me 
correlation between the measurement K and the orientation 
17 conditional on a typical model W. We may also think of 
I(K, Sl)\w as the entropy of 17 reduced by the finite entropy 
in its distribution when given typical measurements K and 



models W. Finally, a simple identity (see appendix|A| yields 
the third form 



I(K, W)\ a = I(K, W) + I(K, Sl)\ w 



(1) 



as the sum of the other two. The mutual information 
I(K, W)\n is the simplest of the three, as it measures the di- 
rect correlation between the continuous signal W and its Pois- 
son samples K because it is conditional on a known orienta- 
tion 17. In the limit where the mean photon count per detector 
pixel is much less than 1, this mutual information is given sim- 
ply in terms of the total number of photons N detected in an 
average measurement 10], 



I{K,W)\n = (l-l)N, 



(2) 



where 7 w 0.577 is Euler's constant. 

In order to sufficiently sample the particle orientations and 
improve the signal-to-noise, information is accumulated over 
the course of many measurements. The information delivered 
in a stream of measurements will initially grow in proportion 
to the number of measurements, since typically each 2D mea- 
surement K samples a different part of the 3D signal W. Two 
of the mutual information quantities introduced above may 
therefore be interpreted as information rates: 

I(K, W)\n = data rate in a hypothetical experiment 
with known particle orientations 

I(K, W) = data rate in the actual experiment 
with unknown orientations 

The time unit in these rates is the time for one measurement. 
The larger of these rates, I(K, W)\q, applies to the situation 
where the noisy data K can simply be signal-averaged to ob- 
tain W . From the ratio 



I(K, W) 
I(K, W)\a 



(3) 



we can assess the reduction in the data rate relative to the 
signal-averaging scenario. Because this reduction can be se- 
vere when shot noise is large, we are primarily interested in 
the dependence of r on the mean photon number per measure- 
ment, N. Not only does an experiment with small r(N) re- 
quire a correspondingly larger number of measurements to ob- 
tain the same signal-to-noise in the reconstructed particle, our 
reconstruction algorithm (Section III CI ) requires many more 
iterations in this case. 

Upon using equation (fl}, the case r(N) = 1/2 corresponds 
to the situation I(K, W) = I(K, ft)\w, that is, the informa- 
tion in one unoriented measurement exactly matches the infor- 
mation acquired about its orientation. This interpretation does 
not imply that reconstruction is impossible for smaller r(N), 
since the criterion refers to the properties of a single mea- 
surement while the reconstruction algorithm may, in princi- 
ple, process many measurements in aggregate. Nevertheless, 
the criterion r(N) > 1/2 correctly identifies the cross-over 
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region separating easy and hard reconstructions. Using (Q~|i we 
can rewrite the feasibility criterion in the form 



r(N) = 1 _ Mi^k >l 



(4) 



Our algorithm, based on the expectation maximization prin- 
ciple J4(], evaluates r(N) with no overhead since the proba- 
bility distributions in £1 of the measurements K, from which 
I(K, is derived, are computed in the course of updat- 

ing the model. When the inequality above is strongly violated 
we should expect a much lower signal-to-noise in the finished 
reconstruction than a naive signal-averaging estimate would 
predict. 

An important general observation about the noise criterion 
dU is that it is remarkably optimistic. As an information mea- 
sure, I(K, £l)\w grows only logarithmically with the com- 
plexity of the particle. Recall that I(K, ft)\w is the entropy 
in fl of a typical measurement K. Suppose a particle of ra- 
dius R has its density resolved to contrast elements of size 
SR. Its rotational structure (in its own space or the Fourier 
transform space of W) will then only extend to an angular res- 
olution SR/R. A sampling of the rotation group comprising 
(R/SR) 3 elements thus provides a fair estimate of the entropy 
and I(K, £l)\w evaluates to a number of order 3 log (R/SR) 
iflsill . This estimate and equations (f2]i and imply that values 
of N of only a few hundred should be sufficient to reconstruct 
even the most complex particles encountered in biology. 



B. Classification by cross correlating data 

The theoretical noise criterion discussed above is beyond 
the reach of the kind of algorithm that would seem to offer 
the most direct solution yfl. In this scenario the task is di- 
vided into two steps. The first is concerned with classifying 
the diffraction data into sets that, with some level of confi- 
dence, describe the particle in a small range of orientations. 
After averaging photon counts for the like-sets to improve the 
signal quality, the diffraction pattern averages would then be 
assembled into a consistent three dimensional intensity distri- 
bution in the second step. 

The most direct method of assessing the similarity of two 
diffraction data is to compute the cross correlation. A pair of 
like-views of the particle would be identified by a large cross 
correlation. Because this measure also fluctuates as the result 
of shot noise, its statistical significance must be estimated as 
well. The result of such an analysis JH is that the cross cor- 
relation based classification can succeed only if the average 
number of photons per diffraction pattern, N, and the number 
of detector pixels, M p ; x , satisfy 



pix- 



(5) 



This criterion imposes a higher threshold on N than the in- 
formation theoretic criterion because M p j x grows alge- 
braically, and not logarithmically, with the complexity of the 
reconstructed particle. 



Since the number of measured diffraction patterns will be 
very large, and the number of pairs to be cross correlated 
grows as its square, the execution of this approach also seems 
prohibitive. As Bortel et al. ^ have shown, however, this 
estimate is overly pessimistic since by selecting suitable rep- 
resentatives of the orientational classes the number of cross 
correlation computations can be drastically reduced. The ex- 
pectation maximization (EM) algorithm described below is an 
alternative classification method where the most time consum- 
ing step is again the computation of very many cross correla- 
tions. But unlike methods where both vectors of the cross 
correlation are data and criterion (5) applies, in the EM algo- 
rithm only one of the vectors is data while the other is derived 
from a model. This has the added bonus that the time of the 
EM calculation is linear, rather than quadratic, in the number 
of measurements. 



C. Classification by expectation maximization 

The algorithm we have developed for the single particle 
imaging experiments and studied previously in the context 
of noise limits [2] is based on the idea of expectation max- 
imization (EM) [4J] . In general, EM seeks to reconstruct a 
model from statistical data that is incomplete. The model in 
the present setting is the intensity signal W, the data are the 
sets of photon counts K recorded by the detector, and the lat- 
ter are incomplete because the orientation ft, of the particle 
relative to the detector, is not measured. 

The EM algorithm is an update rule on the model, W — > 
W', based on maximizing a log-likelihood function Q(W'). 
The algorithm derives its name from the fact that Q(W) is ac- 
tually an expectation value of log-likelihood functions, where 
a probability distribution based on the current model parame- 
ters W is applied to the missing data f2. We will derive Q(W) 
for the single particle imaging problem in stages, beginning 
with the log-likelihood function for the photon counts at a sin- 
gle detector pixel. 

Let W(q) be the time-integrated scattered intensity at spa- 
tial frequency q when the particle is in some reference ori- 
entation. The detector pixels, labeled by the index i, approxi- 
mately measure M p - lx point samples W(q.i). When multiplied 
by the pixel area and divided by the photon energy, W(qi) 
corresponds to the average photon number recorded at pixel 
i. Since these normalization factors are constants, we will re- 
fer to W interchangeably as "intensity" or "average photon 
number." If we now give the particle some arbitrary orienta- 
tion fi, the average photon number at pixel i is iy(Rn • qi), 
where Rq is the orthogonal matrix corresponding to the ro- 
tation between the reference orientation and f2. Because the 
implementation of the algorithm approximates the continuous 
il with a discrete sampling of M Iot points labeled by the in- 
dex j, we define Wij = W^Rj ■ qi) as the average photon 
number at detector pixel i when the particle has orientation j. 

The log-likelihood function for the mean photon number 
W[j, given a photon count at pixel i in measurement k, is 
the logarithm of the Poisson distribution (apart from an irrel- 
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evant constant): 

Qi jk (W) 



(6) 



Summing this function over the detector pixels gives the log- 
likelihood function associated with the joint and independent 
Poisson distributions on the photons detected in a single mea- 
surement (labeled by fc): 



Mr, 



(7) 



If we knew the orientation j that applied to the counts K ik of 
measurement k, we would try to maximize the corresponding 
Qjk with respect to the model values W(j. The EM algo- 
rithm deals with the missing information by making an ed- 
ucated estimate of j, for each measurement k, based on the 
current model values. However, before we enter into these 
details we should point out that the EM algorithm in our for- 
mulation works with many more model parameters than there 
are in the physical model. That is because Wij and W^ji are 
treated as independent parameters even in the event that the 
corresponding spatial frequencies Rj • and Hj> ■ are 
nearly the same. This overspecification of parameters will be 
rectified by the "compression step" described below. 

The EM algorithm defines the log-likelihood function 
Q(W) on the updated model parameters W by assigning 
a provisional distribution of orientations j to each measure- 
ment k based on the current model parameters W. The j- 
distribution is given as the normalized likelihood function for 
the measurements K.-,k conditional on j and the model param- 
eters W. Up to an irrelevant j-independent factor, the condi- 
tional probability in question is the product of Poisson proba- 
bilities at each detector pixel: 



Rjk{W)= J] W*«*exp(-Wy). 



(8) 



The normalized likelihood function allows for an arbitrary 
prior distribution of the orientations j which we denote by 
the normalized weights Wj : 



Pjk(W) 



WjR 3 k(W) 



(9) 



This form is necessary even when the prior distribution on 
orientations fl is uniform (the usual assumption for the single 
particle experiments) because in general the discrete samples 
j cannot be chosen in such a way that the weights Wj are uni- 
form. The EM log-likelihood function may now be written 
explicitly: 

M data M rot 

Q(W) = E p jk(W)Q jk (W'). (10) 

k=l 3=1 

Details on the maximization of Q(W) are given in ap- 
pendix|Bj the resulting "maximizing" (M) update rule is sim- 
ple and intuitive: 



M 



w'- 
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Ek=rPMw)K lk 



(id 



We see that the data Kn~ are averaged over all the data sets 
(k index) with the unknown orientation index j distributed ac- 
cording to probabilities Pjk (W) defined by the current model. 
It is instructive to check that the update rule applied to an ar- 
bitrary rotation of the true signal leaves the signal unchanged 
(details in appendixlBli. 

We now return to the point that the parameters Wij over- 
specify the true model parameters. For fixed j, the M p j x 
numbers Wij correspond to a tomographic sampling of the 
3D space of intensities on a spherical surface with orientation 
in the 3D space specified by j. To recover a signal in the 3D 
space we define a "condensation/compression" (C) mapping 



C : Wi.. 



W(p), 



(12) 



where p denotes a spatial frequency sampling point in the 3D 
intensity space. Since the samples p will be arranged on a 
regular 3D grid, we define interpolation weights /(q) for a 
general point q in the 3D space which vanish for large |q| and 
have the property 



l=£/(p-q) 



(13) 



for arbitrary q. Recalling that the value Wij corresponds to 
the 3D sampling point Hj • q^, the signal values after the com- 
pression mapping are given by 



W(p) = 



Z^i=l 



E£?/(p- 



Rj • qi)Wi. 



i=l 2^1 = 1 J \V 



R 7 - 



(14) 



To begin another round of the EM algorithm, after the con- 
densation step, the signal values on the 3D grid have to be 
"exported/expanded" (E) to the tomographic representation: 



E 



W(p) - W^ 



(15) 



Using the same interpolation weights and rotation samples j 
as before, we have 



(16) 



W[a then have the ef- 



The combined mappings EC: Wij 

feet of imposing on the redundant tomographic representation 
of the signal the property that it is derived from values on a 3D 
intensity grid. A slightly different way of grouping the three 
mappings defines one iteration of what we will call the EMC 
algorithm (Expansion followed by expectation Maximization 
followed by Compression): 



C-M-E: W(p) -*• W'(p). 



(17) 



The most time consuming step of the EMC algorithm is 
the computation of the probabilities Pjk(W). Prior to nor- 
malization these are the likelihood functions Rj).(W) whose 
logarithms are given by 



log (R jk (W)) = >K ik logW, 



Wi„ 



(18) 



i=i 
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At the heart of the algorithm we have to compute the cross 
correlation (sum on i) between the photon counts in each 
measurement k and the logarithm of the signal at each tomo- 
graphic sampling (particle orientation) j. Since data are not 
cross correlated with data, as in some classification methods, 
the time scaling is linear in the number of measurements. Af- 
ter normalizing to get Pjk (W), the mutual information needed 
for the noise criterion (|4|i is obtained without significant addi- 
tional effort (details in appendix[B]i: 



i(K,n)\ 



1 



w 



dat; 



k=i j=i v j / 

(19) 

The expectation maximization technique described above 
is very similar to that used by Scheres et al. [6] for cryo-EM 
reconstructions. Cryo-EM and single-particle x-ray imaging 
differ in two important physical respects. The first is that the 
diffraction data in the x-ray experiments has a known origin 
(zero frequency), thereby reducing the missing information. 
This is not completely an advantage because the diffraction 
data, after a successful reconstruction, must undergo an addi- 
tional stage of phase retrieval before the results can be com- 
pared with cryo-EM. The second difference is the noise model 
that applies to the two techniques. In the absence of back- 
ground, the shot noise in the x-ray experiments is a funda- 
mental and parameter-free process, whereas the background 
ice scattering in cryo-EM requires phenomenological models. 
The expectation maximization algorithm is general enough 
that these differences do not change the overall structure of 
the reconstruction process. In fact, the work of Scheres et al. 
|0] points out that the algorithm is readily adapted to include 
additional missing data, such as conformational variants of the 
molecule. 

Redundant representations of the model parameters, and 
operations that impose consistency with a physical (3D) 
model, are also shared features. Scheres et al. B6Q obtain the 
3D model using ART [7], a least-squares projection technique. 
The corresponding operation in our reconstruction algorithm 
is the linear compression-expansion mapping E • C. The re- 
dundancy question did not arise in the same way for the min- 
imal model studied previously [2], with only a single rotation 
axis. There the intensity tomographs did not intersect and the 
speckle structure had to be imposed by an additional support 
constraint on the Fourier transform of the intensity distribu- 
tion. 

Finally, we note that Fung et al. |8|] have developed a tech- 
nique that, like our expectation maximization approach, uses 
the entire body of data in a single update of the model param- 
eters. 



III. TEST PARTICLES 

When simulating the single-particle experiments it is im- 
portant to distinguish between the resolution limit imposed 
by the maximum measured scattering angle and the resolution 
intrinsic to the scattering particle as a result of its dynamics. 



The scattering cross section for a complex molecule is gen- 
erally significantly smaller, at large momentum transfer, than 
what is predicted by atomic form factors and a static molecu- 
lar structure. This phenomenon is well known in crystallogra- 
phy, where the coherent illumination of numerous molecules 
in various states of perturbation is equivalent — when consid- 
ering the information recorded in Bragg peaks — to a single 
molecule with blurred contrast. The same effect, but in the 
temporal domain, will diminish the scattering at large angles 
in the single-particle experiments. 

The dynamics of molecules subject to intense x-ray pulses 
is complicated by the presence of several physical processes 
09J]. In the case where the degree of ionization of the atoms is 
relatively low during the passage of the pulse, the x-ray scat- 
tering is dominated by the atomic cores and can be analyzed 
by modeling the atomic motions. Let A-^(t) be the amplitude 
of the incident radiation in a particular momentum mode k. 
The cross section for scattering a photon into mode k + q 
with frequency uj contains as a factor the expression 



r(q) = 



dtA^t)^ 



p 



/ p (q)exp [iq- r p (t)} 



(20) 



where / p (q) is the atomic form factor of atom p whose po- 
sition r p (t) changes with time as a result of large scale ion- 
ization, etc. We are interested in modeling the q-dependence 
of the molecular form factor (l20b . Without access to detailed 
simulations of the Coulomb explosion, we have adopted for 
our data modeling the simple one-parameter form, 



r(q)/r(0)=exp(-B|q| 2 )S(q), 



(21) 



where 5(q) is the normalized static (t = 0) structure factor 
of the molecule. This Gaussian form results if the dynam- 
ics of the positions r p (t) can be approximated by independent 
Gaussian fluctuations over the coherent time scale T of the 
pulse. The period T, or the time during which the function 
Ak(t)e luJt is approximately constant, is significantly shorter 
than the pulse duration in a non-seeded free electron laser 
id. 

We expect more detailed dynamical simulations of the scat- 
tering cross section to show significant departures from the 
form above, of a simple Gaussian factor modulating a static 
structure factor. Rather, the effective structure factor of an ex- 
ploding molecule should resemble that of an atomistic density 
that is primarily blurred radially, with contrast at the surface of 
the molecule experiencing the greatest degradation [9]. Given 
such complications, our test particle modeling ignores atom- 
icity and treats the particle more simply as a distribution of 
positive contrast on a specified support with a phenomenolog- 
ically defined intrinsic resolution given by the form ( f2Tb . 



A. Binary contrast particles 

It is a great convenience, when developing algorithms, to 
have a simply defined ensemble of problem instances that of- 
fers direct control over the key parameters. We have chosen to 
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Output: particle contrast on cubic grid C 

C <— RandomContrast 

for i <— 1 to 4 do 

C <— BinaryContrast (C) 
C <- LowPassFilter (C) 
end 

return C 



Algorithm 1: test particle construction 



FIG. 2: Examples of the random binary contrast particles used in our 
simulations. The contrast is nearly uniform inside a labyrinthine re- 
gion that fills half the volume of a sphere. Shown are particles of 
radius R — 4,8, 12 (left to right), where the dimensionless R is in 
units of the intrinsic resolution, i.e. the scale of the voxels shown in 
cross-section for each particle in the lower panel. The particle render- 
ings in this paper, such as those above, show the iso-contrast surface 
appropriate for the labyrinth walls (half the maximum contrast). 



work with an ensemble having a single parameter that controls 
the complexity of the particle, where our measure of complex- 
ity is the dimensionless radius R which specifies the physical 
particle radius in units of the intrinsic resolution. Our particles 
have the following properties: 

(1) Spherical shape, (2) binary contrast, and (3) Gaus- 
sian form factor. 

Property (1) is chosen to make the reconstructions as hard 
as possible, both for the assembly of the 3D intensity and 
later in the phase retrieval stage (the spherical support offer- 
ing the fewest constraints [11]). We chose (2) to mimic a large 
biomolecule that because of damage can only be resolved into 
solvent (empty) and non-solvent regions. This property also 
has the convenience that most of the information about the 
structure is conveyed by rendering a single 3D contour at an 
intermediate contrast. Property (3) defines the intrinsic reso- 
lution. 

The construction of a test particle begins by choosing a 
value for the dimensionless radius R; the final contrast val- 
ues will be defined on a cubic grid of size 2R + 1. Our test 
particle construction algorithm is diagramed in pseudocode 
(algorithms Q] and |2]i . It uses the particle support S (voxels 
inside the sphere of radius R) and a Gaussian low-pass filter 
F(q) to impose the form factor (1211 on the Fourier transform 
of the contrast. To keep the dynamic range of the intensity 
measurements in our simulations constant for different parti- 
cle sizes R, we parameterize the filter as 



F(q)=exp [-1.5(|q|A? max ) 2 ], 

where g max = ir/R. Since only scattering with |q| < q n 
measured, the discarded power is always 



(22) 



is 



J 1 °°exp(-3g 2 )g 2 rfg 
/ °° exp (-3q 2 ) q 2 dq 



= 11%. 



(23) 



Input: arbitrary contrast on C, support S 
Output: BinaryContrast(C) 

foreachr £ S do C*[r] <- 

v <- MedianValue(C[r G S]) 

foreach r £ S do 
if C[r] < v then 
I C[t] <- 



else 

I C[r] 
end 
end 

return C 



In Figure 2 we show examples of binary contrast test par- 
ticles constructed for three values of the dimensionless radius 



Algorithm 2: binary contrast projection 



R. The largest particles considered in this study had R = 8 
because the reconstruction computations grow, in both time 
and memory, very rapidly with R. In Section 7 we discuss the 
scaling of the computations with R. 



B. Degraded resolution biomolecules 

To put our dimensionless radius R in perspective, the 
same low-pass filter used for test particles was applied to the 
roughly 0.8 MDa biomolecule GroEL 111 211 . After binning the 
coordinates of the non-hydrogen atoms of the PDB structure 
on a cubic grid of resolution 2A, the discrete Fourier trans- 
form was computed and truncated at the size 2R + 1. The 
result was then multiplied by the filter d22l) and inverse trans- 
formed to give the contrast used in the simulation. 

Figure 3 shows GroEL processed in this way for three val- 
ues of R. Handedness in the protein secondary structure be- 
gins to appear at about R = 6. These degraded resolution 
models of GroEL, that mimic the effects of dynamics and a 
finite duration pulse, are of course completely phenomenolog- 
ical. It may not even be true that the diffraction signal can be 
modeled by an appropriately blurred contrast function. This 
will be the case, for example, if the damage dynamics strongly 
varies with the orientation of the particle. Finally, we cannot 
ignore the fact that, as a result of thermal motion and solvent, 
at some level of resolution even the model of a unique (t = 0) 
diffraction signal breaks down. 
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FIG. 3: Contrast of the protein complex GroEL |12] degraded by 
the same type of low-pass filter used in the construction of random 
test particles (Fig. The axial length of GroEL is approximately 
15 nm. At high damage, R — 4 (left), the contrast reveals only 
the gross particle shape ("cornEL"). Handedness of the protein sec- 
ondary structure appears at about R = 6 (middle) and is fully evident 
by R = 8 (right). 



IV. EXPERIMENTAL PARAMETERS 

A. Detector parameters 

The detector geometry, pixel dimensions, and position rel- 
ative to the scattering particle determine the spatial frequency 
samples q; of the experimental data. Our simulations use a 
detector model with three parameters: oversampling factor a, 
maximum scattering angle 9, and a dimensionless central data 
cutoff a. 

The oversampling factor has the most direct interpretation 
in real space. Oversampling a corresponds to embedding the 
particle, with contrast defined on a grid of size 2R + 1, on 
a grid magnified in size by the factor a. This also defines 
the dimensions of the 3D intensity grid, on which most of the 
computations of the EMC algorithm take place. Since Fourier 
transforms play no role in these computations there is no in- 
centive to make the dimensions of the intensity grid a product 
of small primes. It is more natural in the EMC calculations, 
which have rotational symmetry about zero frequency, to have 
intensity grids of odd dimension with indices that run between 
-q max and +q max . Here q max is given by a R rounded up to 
the nearest integer. Speckles in the intensity distribution will 
have a linear size a in grid units. 

A real detector does not measures point samples with re- 
spect to spatial frequency but convolves the true intensity sig- 
nal with a point spread function defined by the pixel response 
lfl3ll . To minimize this effect the oversampling in experiments 
should be kept large. Another reason for keeping a large is al- 
gorithmic: the expansion and compression steps of the EMC 
algorithm, which interpolate between tomographic and grid 
samples, introduce errors that are also minimized when the 
oversampling is large. In this study we used a = 6. 

The maximum scattering angle is determined by the radius 
of the detector, L, and the distance D between the particle and 
the detector, by tan 6 = L /D. We define L to be the radius of 
the largest disk that fits inside the actual detector. This corre- 
sponds to discarding data recorded in pixels outside the disk, 
in the corners of the detector. With this minor truncation of 
the data, all of it can be embedded in the 3D intensity grid for 



any particle orientation (relative to the reference orientation). 

The actual choice of spatial frequencies used by the 
EMC algorithm is largely arbitrary, and so we start by con- 
sidering detector pixels at arbitrary positions [xi, j/j] in the de- 
tector plane. Up to a constant factor, the photon momentum 
detected at pixel i is 



(24) 



and the corresponding spatial frequency, or momentum trans- 
fer from the incident beam, is q^ = p; — po, where po is given 
by (l24l with Xi = iji = 0. Intensities at these spatial frequen- 
cies will be represented as interpolated values with respect to 
the 3D intensity grid. Since the latter has unit grid spacing, we 
choose an appropriate rescaling of the q^ that is well matched 
with this. Because most detectors will have pixel positions 
on a square lattice with some pixel spacing d, our simulations 
are based on this model. We note, however, that the EMC 
algorithm operates with general tables of frequencies q i5 and 
whether these are derived from a standard detector or a more 
complex tiled design is invisible to the workings of the algo- 
rithm. For the square-array detector n — rrii d, — rii d, 
where rrii and m are integers, and the rescaled spatial fre- 
quencies are given by 



q» = 



D/d] 



- [0,0, D/d] 



(25) 



These samples lie on the surface of a sphere that passes 
through the origin and the scaling is such that samples near 
the origin match the 3D integer grid: 



qi « [to,-, n.i, 0]. 
The pixels at the edge of the detector have 



*? = L/d, 



(26) 



(27) 



and should correspond to frequencies at the highest resolution 
shell, or |qj| = q max . This condition, evaluated for d25l ) with 
D = L cot 6, reduces to 



Qmax = (L/d) cos 6 sec (9/2). 



(28) 



For a small maximum scattering angle this reduces to the 
equality between the pixel size of the detector, 2(L/d), and 
the number of samples in one dimension of the intensity grid, 
2(Zmax- The ^-dependence of expression ((28) is a result of the 
spherical shape of the Ewald sphere. 

The forward scattering by the uniform or uninteresting part 
of the charge density of a compact particle is so much more 
intense than the scattering at larger angles from the non- 
uniform, interesting part, that most detectors need to have the 
central pixels blocked (beyond what is needed to avoid the in- 
cident beam). Figure [4] shows a simulated intensity scan pass- 
ing through the origin, for one of the test particles described in 
Section[ni] The huge central speckle contains essentially only 
information about the total charge and almost no information 
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FIG. 4: Radial intensity scan for an R = 8 test particle on a linear 
scale. Our simulations only use data collected outside the central 
speckle, g/g max > a/R = 0.18. 



about the structure of the particle. A natural size for the detec- 
tor block is such that scattering at frequencies inside the main 
speckle is discarded. We obtain the cutoff frequency q m - ln by 
evaluating the scattering amplitude of a uniform ball having 
the same radius R as the test particle. The first vanishing of 
this amplitude determines q m [ n : 

(-R/gmaxkmin = OL (29) 

where a s» 1.43 is the first non-zero root of -kx — tan-Trx. 
Since q max = u R, more generally we define 

9mm = era, (30) 

which shows that the low frequency cutoff is a times the 
speckle size in grid units (a). 

We conclude this section by reviewing the procedure for 
generating the spatial frequencies q; used by the EMC algo- 
rithm. Prior to this a dimensionless test particle radius R has 
been selected. The half-size of the intensity grid is then given 
by q max = cr R, and our simulations used a — 6. Given the 
maximum scattering angle 6 we then determine the detector 
radius in pixel units, L/d, from d28l ). as well as the detector- 
particle distance D jd using D = L cot 6. All our simulations 
used 6 = 45°. Having determined L/d, we determine (for our 
choice of square array detector) the indices mj, i%i satisfying 
in 2 + n 2 < (L/d) 2 . These are used in formula (f25t to give 
the table of frequency samples in the reference orientation of 
the particle. Finally, to model the discarded central data we 
remove from the table all samples with |q,-| < q m i u , where 

?min = 1.43(7. 



B. Diffracted signal strength 

A key experimental parameter is the flux of photons inci- 
dent on the particle. For the purpose of simulating the recon- 
struction process, however, a more convenient form of this 
parameter is the average number of photons scattered to the 
detector in one measurement, N. This normalization of the 



diffraction signal can be carried out once the detector's spatial 
frequency samples are determined. 

In order to generate diffraction data with the property that 
the mean photon number is N, we first compute the squared 
magnitude of the Fourier transform of the particle contrast 
embedded on the intensity grid (having size 2g max ). We in- 
terpret the numbers on this grid as the photon flux scattered 
into the respective spatial frequencies, at this point with ar- 
bitrary global normalization. A detector pixel at one such 
frequency sample will record an integer photon count drawn 
from the Poisson distribution having the (time and pixel area- 
integrated) flux as mean. The quantity we wish to normalize 
is the net flux at all the detector pixels. When this quantity is 
N, then the mean photon number per measurement will also 
be N. 

Because the particle contrast is not spherically symmetric, 
the net diffracted flux to the detector pixels will fluctuate with 
the particle orientation. We avoid bias arising from this ef- 
fect by sampling a few hundred orientations of the particle 
and applying the associated rotations to the frequency sam- 
ples in order to estimate the orientation-averaged flux. This 
number is then used to rescale the flux values on our 3D inten- 
sity grid. With this normalization in place, we generate data 
by repeatedly sampling random orientations, rotating the fre- 
quency samples, and then drawing Poisson samples at each of 
the rotated frequencies for the means given on the normalized 
intensity grid. Linear interpolation of the grid intensities is 
used to obtain the diffracted fluxes at the rotated frequencies. 

V. RECONSTRUCTION PARAMETERS 

In addition to the diffraction data, prior information about 
the particle provides additional parameters to the reconstruc- 
tion algorithm. The only such information we consider in 
our simulations is the dimensionless particle radius R. This 
parameter is used in both the intensity reconstruction by the 
EMC algorithm, as well as the phase retrieval stage that re- 
constructs the particle contrast from the intensity. 

A. Rotation group sampling 

The EMC algorithm orients 2D data tomographs within the 
3D intensity distribution using a discrete sampling of the rota- 
tion group. An optimal sampling is one where the samples are 
uniformly distributed and at a sufficient density to resolve the 
smallest angular features. Because speckles in the intensity 
distribution have linear dimension a, features of this size (in 
voxel units) at the highest resolution shell, q max , determine 
the angular scale: 

59 = a/q max = l/R. (31) 

The rotation group parameterization that is best suited for 
generating uniform samplings is based on quaternions lfl4ll . 
Unit quaternions are points on the unit sphere in four dimen- 
sions and encode 2-to- 1 the elements of the continuous rota- 
tion group in three dimensions. Their key property is the fact 
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FIG. 5: Sampling the 3D rotation group is equivalent to sampling 
the surface of a sphere in 4D. Shown in the top row is a scheme 
for sampling the surface of a sphere in one lower dimension based 
on subdivisions of the 20 faces of the icosahedron. The analogous 
construction in one higher dimension subdivides the 600 tetrahedral 
faces of the 600-cell, two examples of which are shown in the bottom 
row. The resolution of the sampling is specified by the number of 
subdivisions of each edge; shown are n = 2 (left) and n = 3 (right). 



When combined with the estimate OTb . this implies that n 
should roughly coincide with the dimensionless particle ra- 
dius R. Moreover, since n sw R, the number of samples (see 
appendixlcb, 

M rot (n) = 10(57i 3 + n), (34) 
grows in proportion with the volume of the particle. 

B. Particle support 

Our phase reconstruction of the complex diffraction ampli- 
tude is carried out with the diffraction magnitude on the same 
grid as used by the EMC algorithm for the intensity recon- 
struction. The support constraint is therefore that the particle 
contrast can be non-zero only within a sphere of radius R grid 
units. In our simulations, which were limited to R < 8, we 
increased the support radius by one or two units because pre- 
cise knowledge of the support is usually not available in real 
experiments. 

VI. RECONSTRUCTION ALGORITHM 



that the distance between quaternions q and q' , in the usual 
sense, is simply related to the angle of the relative rotation be- 
tween the group elements associated with q and q' . For small 
relative rotations 59 this relationship is: 

\\q-q'\\^ 66/2. (32) 

Given a 59, the problem of selecting rotation group samples, 
such that any rotation is within a relative rotation 59 of some 
sample, is thus equivalent to the standard problem of con- 
structing efficient coverings 11511 of the 3-sphere. We solve 
this covering problem by using a design based on a highly 
symmetric poly tope, the 600-cell lfl6Tl . This polytope is the 
four dimensional analog of the icosahedron in that it approx- 
imates the curved surface of the sphere by a union of regular 
simplices — 3D tetrahedra rather than triangles in four di- 
mensions. The regular tetrahedron is efficiently covered by 
points in the fee arrangement; coverings with increasing res- 
olution are shown in Fig. [5] The resolution of the covering is 
parametrized by an integer n that gives the number of subdivi- 
sions of each edge of the tetrahedron. Our 3-sphere coverings 
are obtained by rescaling the points that cover the tetrahedral 
faces of the 600-cell to unit length. Details of the construction, 
including the computation of the sample weights, are given in 
appendix[C] 

There are only two properties of the rotation group sam- 
pling that have direct relevance to the EMC algorithm for in- 
tensity reconstruction: the angular resolution 59 and the num- 
ber of rotation samples, M rot . Defining 69 as the covering 
radius of the sampling, the n-dependence is given by (see ap- 
pendix[C]>: 

59(n) « 0.944/n. (33) 



Our algorithm for reconstructing the scattering contrast of 
a particle begins by reconstructing the 3D intensity with the 
EMC algorithm for classifying diffraction data. This section 
describes in concise algorithmic language the EMC process 
already sketched in Section lHCl For the relatively much eas- 
ier final step, of reconstructing the particle contrast from the 
intensity, we use the difference-map (DM) phase reconstruc- 
tion algorithm. A short description of the DM algorithm, de- 
scribed in greater detail elsewhere [17], is included for com- 
pleteness. 

A. EMC intensity reconstruction 

The EMC algorithm builds a model of the 3D intensity from 
a large collection of non-oriented, shot-noise limited diffrac- 
tion data. The orientational classification of the data is prob- 
abilistic, where the data are assigned probability distributions 
in the rotation group and these are systematically refined so as 
to maximize the likelihood of the intensity model. The EMC 
algorithm comprises three steps: 

E-step: Expand the grid intensities into the tomographic rep- 
resentation: W[q\ — > Wij. 

M-step: Update the tomographic intensities by expectation 
maximization: Wij — ► W[j. 

C-step: Compress the tomographic model back into a grid 
model: W'[q]. 

We use pseudocode to describe these steps in the next sec- 
tions. The notation matches the theoretical discussion in Sec- 
tion lHCI Spatial frequencies are denoted by q and p, detector 
pixel indices by i, rotation group samples by j, and k is always 
a data index. 
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Input: grid model W[q], reference tomograph spatial 

frequencies q^, rotation matrices Rj 
Output: tomographic model Wij 

for j <— 1 to A/ r ot do 

for i <— 1 to A/pix do 
q' <— • qi 

Wij <- Interpolate(W[q],q') 
end 

end 

return 

Algorithm 3: E-step: model expansion 



Input: tomographic model Wij, data Kit, rotation group 
weights Wj 

Output: updated model WL, mutual information / 

I <- 

for j <— 1 to Mrot do 
for £ <— 1 to A/pi x do 

end 
end 

for k <— 1 to Mdata do 

for j <— 1 to Af rot do 

Pjfc <- ConclProb(W / i J ,iCifc) 

for i <— 1 to Mpi x do 
| w£ «- W£ + /',/ /\',, 
end 

+ -Pjfc 

/^/ + P jfe -Log(P^M) 
end 

end 

for j <— 1 to Af r ot do 
for i *— 1 to A/pix do 

end 
end 

/ <- //A/ data 
return VK/j, / 

Algorithm 4: M-step: expectation maximization 



7. E-step: model expansion 

In the E-step the grid model of the intensities is expanded 
into a redundant tomographic representation (model) to make 
the expectation maximization step (M) easier. Intensities Wij 
in the tomographic model are treated as independent variables 
by the M-step. 

Each element of the tomographic model is associated with 
a particular detector spatial frequency q, and rotation matrix 
R 3 . The A/pi x frequencies refer to the detector (or par- 
ticle) in an arbitrary reference orientation; the M rot matrices 
Rj are rotations relative to the reference orientation. The con- 
struction of the qi for a simple square array detector is given 
in Section [IV] Our rotation matrices are generated from pre- 
computed lists of quaternions that sample the rotation group 
at the desired resolution (see appendix [Cj. We use linear in- 
terpolation to extract intensity values at the rotated spatial fre- 
quencies q' = Rj • q^. 



2. M-step: expectation maximization 

The probabilistic classification of the data, and then their 
aggregation into an improved tomographic model, is per- 
formed in the M-step. Central in this process is the compu- 
tation of the conditional probabilities Pjk . These are based on 
the current intensity model, Wij . When the diffraction data 
(photon counts) are averaged with respect to these proba- 
bilities (equation (fTTTi), the result is a tomographic model W[j 
with increased likelihood. 

The most time-intensive parts of the computation, and in- 
deed of the whole reconstruction algorithm, are the nested 
loops over k, j, and i that would imply an operation count 
that scales as Mdata x M Iot X M p - lx . However, the innermost 
loop, on the pixel index i, can be greatly streamlined in both 
places where it occurs by skipping all the pixels that have zero 
photons. We use a sparse representation of the photon counts 
that reduces the time scaling to Mdata x M rot x N, since most 
non-zero counts will be a single photon and the average total 
photon number is N. 

Two copies of the intensity model are held in memory at 
any time: the current model for conditional probabilities, and 
the updated model obtained by photon averaging. In the inner- 



most loop computations of the conditional probabilities only 
the logarithm of the current model is used. The actual compu- 
tation in this most time-intensive step involves incrementing 
the conditional probabilities Pjk by log Wij for the pixels i 
that recorded photons in diffraction pattern k (or a multiple of 
this if multiple photons were recorded). After the conditional 
probability for a particular k is computed, the second time- 
intensive step is executed. In this the updated model W[j is 
incremented by Pjk, again, for only the pixels i where pho- 
tons were recorded (or a multiple of this). 

The pseudocode shows how directly the mutual informa- 
tion I(K, Sl)\w is computed from the conditional probabili- 
ties Pjk- In Section lVIIBI we show how this quantity provides 
a useful diagnostic for reconstructions in addition to having 
intrinsic value as a measure of information. There are a few 
places not shown in the pseudocode that require special at- 
tention in the implementation. As an example, it is important 
to check for over/underflow in the computation of the condi- 
tional probabilities when the logarithms of the not yet normal- 
ized probabilities are exponentiated. 
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Input: data index k, tomographic model Wij, data Kik, 

rotation group weights Wj 
Output: Pj k = CondProh(W lj , K ik ) 

S <- 

for j <— 1 to A/ r ot do 
Pj k <- Log(iUj) 

for i <— 1 to Mpj x do 
end 

P 3h <- Exp(P jfc ) 
S* <— S + Pjfc 
end 

for j <— 1 to A/ r ot do 
| P, fe «- Pik/S 
end 

return P,*. 

Algorithm 5: conditional probability 



Input: tomographic model W(j, reference tomograph spatial 

frequencies q; , rotation matrices Rj 
Output: grid model W'[q] 

foreach q do 

W'[q\ <- 
S[q] «- 
end 

for j <— 1 to A/ r ot do 

for i <— 1 to A/pi x do 
q' <— Rj q» 

G <- GriclNeighbors(q') 

foreach p e G do 

/ <— InterpolationWeight(q' — p) 
W'[p]^W'[p} + fW^ 
S[p] «- 5[p] + / 
end 
end 

end 

foreach q do 

I W'\q\ <- W[q]/S[q] 
end 

W'[q] <- FrieclelSym(VK'[q]) 
return W'[q] 

Algorithm 6: C-step: model compression 



3. C-step: model compression 

The C-step ([Pil l is the reverse of the model expansion, or E- 
step, and both of these operations use far less time than the M- 
step that comes in between. Over the course of multiple EMC 
iterations, the combination of C-step followed by E-step has 
the effect of making the tomographic model of the intensity, 
W[j, consistent with a 3D model W[q] defined on a grid. We 
use linear interpolation (as in the E-step) when collapsing the 
tomographically sampled intensities onto the grid. 

Because of noise, the averaging of the data Kik m me M- 
step produces a tomographic model that does not respect the 
Friedel symmetry W[q] = W[— q] when compressed to the 
grid model. This symmetry is restored at the conclusion of the 
C-step by replacing W'[q\ and W'[— q] with their average. 



B. Phase retrieval 

We use the difference-map algorithm ifnll to reconstruct the 
phases associated with the Fourier magnitudes obtained by the 
EMC algorithm, as well as the Fourier magnitudes in the cen- 
tral missing data region (q < q m i n ). 

Our pseudocode for the difference-map emphasizes its 
generic character as a method for reconstructing models sub- 
ject to two constraints. One constraint, provided by the 
EMC intensities, is the Fourier magnitude constraint. The 
difference map implements this constraint by the operation 
FourierProj. FourierProj takes an arbitrary, real- 
valued input contrast C and projects to another contrast F 
called the Fourier estimate. The action of FourierProj 
is most transparent on the Fourier transforms of the input and 
output contrasts. In the data region q m [ n < q < q mELX , the 
output F inherits the Fourier phases of the input C with the 
Fourier magnitudes provided by the square roots of the EMC 
intensities. In the central missing data region, q < q m i n , both 
the Fourier phase and magnitude of the input are preserved in 



the output. Finally, for spatial frequencies above the data cut- 
off, q > tj'max) me Fourier amplitudes of F are identically set 
to zero. 

The other difference map constraint is implemented by the 
operation SupportPro j. When acting on an arbitrary real- 
valued input contrast C, SupportPro j outputs the support 
estimate S. The output S is obtained by zeroing the contrast in 
C outside the support of the particle and all the negative con- 
trast within the support. Since the phase reconstructions are 
performed on exactly the same size grids as the EMC intensity 
reconstruction, the radius of the spherical support region im- 
plemented by SupportPro j is the same dimensionless ra- 
dius R (increased by a few grid units) that defines our binary 
contrast test particles and degraded resolution biomolecules 
(SectionlVBl). 



1. Difference map phase reconstruction 

The difference-map reconstruction begins with a randomly 
generated initial real-valued contrast X and is otherwise 
completely deterministic. As X is updated by the opera- 
tions FourierProj and SupportProj, the correspond- 
ing Fourier and support estimates, F and S, are generated. In 
reconstruction problems that reach fixed points X * , where the 
magnitude e of the update AX vanishes, either F or S can 
be output as the solution since they are the same when e van- 
ishes. This is not the case even for our phase reconstructions 
with simulated data. There are multiple sources of error that 
make it impossible for both constraints to be satisfied simul- 
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Input: constraint projections FourierProj and 

SupportPro j 
Output: real-valued particle contrast C, error series E 

C <- 
M <- 

X <— RanclomRealContrast 

for i <— 1 to IterationCount do 

5* <— SupportPro j (X) 

F <- FourierProj (2 S - X) 

e^||AX|| 

£■ <— Append(-E, e) 

if e < ErrorMax then 

C ^C + F 
M <- M + 1 

end 
end 

C <- C/M 
return C, E 

Algorithm 7: difference-map algorithm for phase 
reconstruction 



taneously. The intensity truncation for q > q max , for exam- 
ple, introduces a small inconsistency even when the diffrac- 
tion data are oriented perfectly by the EMC algorithm. Of the 
two alternatives to choose for the reconstructed contrast, we 
use the Fourier estimate F for reasons that will be clear be- 
low, when we discuss the modulation transfer function (MTF) 

The behavior of the difference-map error metric e = \\F — 
S\\ typically has two regimes in phase reconstructions. In the 
early iterations e decreases nearly monotonically, thereby im- 
proving the consistency between the Fourier and support es- 
timates. In the second regime e is relatively constant, with 
small amplitude fluctuations suggestive of a steady-state. Be- 
cause e = || AX 1 1 also measures the magnitude of the update, 
the iterated contrast X and the estimates F and S are also 
fluctuating in this regime. To produce reproducible results, 
we average the Fourier estimate F in the steady-state and call 
this the result of the phase reconstruction. 

Since our spherical support is consistent with either enan- 
tiomer of the particle (and the intensity data does not distin- 
guish these either), a successful reconstruction will be the in- 
version of the true particle in about half of all attempts that 
start with a different initial X. In those cases we invert the 
reconstruction before making comparisons. 



2. Modulation transfer function 

The Fourier estimates F have, by construction, always the 
same Fourier magnitudes in the data region g m ; n < q < <7, nax 
(provided by the EMC intensities). This means that the fluctu- 



ations of F at these spatial frequencies are purely the result of 
phase fluctuations. By averaging the difference-map estimates 
F (after the steady-state is established), we are performing the 
average 

MTF(q) = (exp (i<f>q)), (35) 

where q is the Fourier phase at spatial frequency q. Phases 
that are reconstructed well and fluctuate weakly give MTF sa 
1, while strongly fluctuating phases lead to a small MTF. 
Since the degree of fluctuation is correlated with the mag- 
nitude of q, we additionally perform a spherical average of 
MTF(q) to define a modulation transfer function that con- 
cisely conveys the quality of the phase reconstruction as a 
function of the resolution q = |q|. 

VII. SIMULATIONS 

This section explores the conditions necessary to recon- 
struct particles in numerical simulations. We are primarily 
interested in understanding the behavior of the reconstruction 
algorithm as a function of the dimensionless particle radius 
R. For any R, the feasibility and quality of reconstructions 
depends critically on three additional parameters: 

1 . Does the average number of photons per measurement, 
N, satisfy the criterion (O on the reduced information 
rate? 

2. Do the M ro t discrete samples of the rotation group pro- 
vide a sufficient approximation of the continuous group 
for particles of the given complexity? 

3. Are the total number of measurements Mdata sufficient 
to reconstruct the particle with acceptable signal-to- 
noise? 

Although the parameters N and A/data are determined by the 
experiment while A/ rot is algorithmic, this distinction is ar- 
tificial when we recognize that both the physical and com- 
putational components of the imaging process are subject to 
limited resources. We have studied the effects of these pa- 
rameters systematically by reconstructing the binary contrast 
test particles described in Section iHll These particles resem- 
ble biomolecules at a resolution above the atomic scale and 
can be generated for any R. Our simulations culminate in a 
desktop computer reconstruction of the GroEL protein com- 
plex rSectionlllTt at a resolution corresponding to R = 8. 

A. Data generation 

All our simulations begin with the construction of the con- 
trast of a particle at a specified dimensionless radius R (Sec- 
tionllllli. After embedding the contrast on a grid with the cho- 
sen oversampling (usually a = 6), the squared Fourier magni- 
tudes are computed as a model of the intensities. All particles 
in our simulations thus have only a single discernable struc- 
ture/conformation at the measured resolution. 
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Simulated data — tables of photon counts — are generated 
by repeatedly Poisson-sampling the intensity model at a set of 
spatial frequency samples specified by our detector parame- 
ters. In each simulated measurement all the spatial frequen- 
cies are rotated by a random element of the 3D rotation group. 
We generate uniform rotation group samples by uniformly 
sampling points on the surface of the unit sphere in four di- 
mensions (quaternions) and mapping these to orthogonal ma- 
trices (equation ( ICU ). The rotation element used to produce 
each measurement is not recorded. Because the data is simu- 
lated with uniform rotation group samples, the group weights 
used by the reconstruction algorithm are the uniform sampling 
weights ( IC13b . Since the rotated spatial frequency samples 
will fall between the grid points of the intensity model, inter- 
polation is used to define the mean of each Poisson-sampled 
photon count. Finally, by normalizing the intensity model as 
described in Section lTV Bl our data have the property that the 
mean photon number per measurement is N. 

The number of data used in the reconstructions (A/data) can 
be very large, sometimes exceeding 10 6 when R is large and 
TV is small. By using sparse records of the photon counts (Sec- 
tion lVIII At , however, the total storage required for the data is 
not much greater than the storage needed for a single 3D in- 
tensity model having the corresponding signal-to-noise. The 
reconstruction algorithm has, of course, no access to the 3D 
intensity model that was used to generate the data. 



B. Convergence with rotation group sampling 

We argued in Section IV A| that an adequate sampling of 
the rotation group, for reconstructing particles of dimension- 
less radius R, is obtained when the rotation group sampling 
parameter n (edge-subdivisions of the 600-cell) matches this 
value. Whereas the proportionality n oc R is clear, we present 
here some additional assessments that support the simple rule 
n as R. While larger n are even better, the n 3 growth in the 
memory used by the algorithm motivates us to identify the 
smallest n that achieves good results. 

The most direct test is to perform a single iteration of the 
EMC algorithm, beginning with the true intensity model. For 
this we generated data with sufficient total recorded photons 
(N x A/data) that signal-to-noise is not a factor. Since the 
data are generated by the same intensity model that begins the 
EMC update, the only thing that can spoil the preservation of 
the intensity by the update is the insufficient sampling of the 
rotation group. In Fig. [6] we show planar slices of the inten- 
sity model after one EMC update for a particle with R = 8. 
The extreme case n = 1, with only Af rot = 60 samples, 
is clearly inadequate because large regions of the intensity 
grid are never visited by a rotation of one of the detector's 
spatial frequency samples. This shortcoming is eliminated 
at about n = 4 (M ro t = 3240), however, the intensity in 
the highest resolution shell lacks the expected speckle struc- 
ture. These features first become established at level n = 8 
(Af rot = 25680). 

There is another assessment of the rotation group sampling 
that does not require the true intensity model (or converged 
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FIG. 6: (Color online) Convergence with respect to rotation group 
sampling. The plot shows the increase in the mutual information 
I(K,Q,)\w as the discrete sampling of the rotation group is in- 
creased; the integer n is inversely proportional to the angular res- 
olution. Saturation of the mutual information with n indicates the 
data K have exhausted the available orientational information in the 
intensity W, here for the case of an 7? = 8 particle. The insets show 
corresponding cross sections of the intensity W' generated by a sin- 
gle EMC update starting from the true intensity W. Speckles in the 
highest resolution shell appear at about n = 8. The intensity scale is 
logarithmic and missing data regions are rendered black. 



reconstruction). In this test we ask to what extent the data 
can detect additional rotational structure just by increasing the 
sampling parameter n. The measure of rotational structure 
most relevant to the available data is the mutual information 
I(K, £Y)\w- For a given intensity model W, this gives the 
information a typical measurement K provides about its lo- 
cation in the rotation group. Clearly this depends on W — 
possibly a poor approximation of the true intensity — as well 
as the noise in the data (mean photon number N). In any case, 
if the value of I(K, Q)\w is significantly increased upon in- 
creasing n, then there is information in the data that can de- 
tect additional rotational structure that should be exploited. 
In our implementation of the EMC algorithm I(K, fi)\w is 
calculated with negligible overhead in every iteration (see al- 
gorithm 4). To test convergence with respect to n we simply 
increase n and observe how much I(K, fl)\w increases. In 
Fig. |6]we also show the behavior of I(K, £l)\w as a function 
of n for the same particle and data used to create the intensity 
cross sections. The leveling off at n w 8 is consistent with our 
earlier observations. 



C. Feasibility with respect to mean photon number 

The total number of photons recorded in an imaging experi- 
ment is N x Afdata- If the particle orientations were known for 
each of the diffraction data, then the quality of reconstructions 
would be independent of how the photon budget is allocated: 
simple signal averaging will give the same result if half the 
number of photons (N/2) are recorded on twice the number 
of data sets (2A/data)- This changes when the orientations are 
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FIG. 7: (Color online) Numerically computed reduced information 
rate r(N) as a function of mean photon number N, for particle sizes 
R = 4, 6, 8 (red, green and blue respectively). The interpolating 
curve shows mutual information averaged over 1 1 particles — rep- 
resented by the scattered points — at each R for various N. The 
vertical lines intersect the curves at their respective r(N) = 1/2 
points. 



FIG. 8: Update magnitudes AW for the intensity reconstruction of 
an 7? = 8 particle at four values of the reduced information rate: 
r(25) = 0.42, r(45) = 0.55, r(80) = 0.72 and r(225) = 0.90. 
The normally rapid convergence (AW — > 0) of the EMC algorithm 
becomes protracted as r(N) approaches the value 1/2. The corre- 
sponding particle reconstructions are shown in Fig. [9] 



unknown, and we rely on the reduced information rate r(N) 
for guidance ("Section !!! Al l. 

We computed r(N) using equation (HJl by numerically eval- 
uating I(K, fl)\w for binary contrast test particles. The strict 
definition of r(N) calls for an average over an intensity en- 
semble W; in our case this corresponds to particles of a par- 
ticular radius R. Figure|7]shows plots of r(N) as a function of 
the mean photon number TV for R = 4, 6, and 8. As shown by 
the small scatter in the results for the larger N, fluctuations 
of I(K,Q)\w within each radius ensemble are very small, 
thus establishing r(N) as a useful statistic when given only 
the particle radius. The single most important conclusion to 
draw from the r(N) plots is that the reduced information rate 
is negligibly reduced (from unity) for even relatively small 
N. Taking r(N) > 1/2 as the feasibility criterion, we obtain 
mean photon number thresholds of N = 27.5, 33.5, and 36.9 
for the three sizes of particles. From studies of the ID minimal 
model [2], where this behavior can be analyzed in greater de- 
tail, we expect the threshold N values to grow logarithmically 
with R. 

The consequences of r(N) being below the feasibility 
threshold are noticed in two ways. First, there is a marked 
change in the behavior of the EMC intensity reconstruction 
algorithm, the progress of which we monitor by the time se- 
ries of the update magnitudes: 



AW 2 = (\W'(p)-W(p)\ 2 



(36) 



Here W' is the update of W and the average is uniform over 
all spatial frequencies between q m i n and q max . Figure [8] con- 
trasts AW time series for four EMC intensity reconstructions 
of the same R = 8 particle, the data differing only in the 
value of N with all other parameters, including the total pho- 
ton number, identical. The rapid decrease to zero, seen in the 
reconstructions with r(80) = 0.72 and r(225) = 0.90, is typi- 
cal when r(N) > 1/2. Likewise, the strongly non-monotonic 
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FIG. 9: Reconstructions of an R — 8 test particle at three values of 
the reduced information rate r(N), all other parameters unchanged. 
Shown in the top panel, from left to right, are r(25) = 0.42, 
r(45) = 0.55 and r(80) = 0.72; the true particle is reproduced 
on the right. The bottom panel shows the corresponding modulation 
transfer function (MTF) computed by the phasing algorithm. Behav- 
ior of the EMC algorithm for these reconstructions is shown in Fig. 
i 



behavior that stretches out over many iterations is normal for 
data that according to our r(N) criterion is below the feasibil- 
ity threshold. At the first broad minimum of AW in the plot 
for the case r(45) = 0.55 the reconstructed intensity is nearly 
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spherically symmetrical (a powder pattern). It takes the EMC 
algorithm many iterations to develop speckle structure by the 
gradual amplification of small features. 

A second manifestation of being below our feasibility crite- 
rion is a loss of resolution in the final particle reconstruction. 
This is demonstrated in Fig. |9l which shows the results of 
three of the reconstructions described above. The degradation 
of high spatial frequency detail in the reconstructions at small 
r(N) is a direct consequence of the reduced information rate 
in these data. With less total information available to recon- 
struct an accurate intensity, the resulting phase reconstruction 
of the particle is compromised. 



D. Reconstruction noise and number of measurements 

Even when the orientations of the diffraction patterns are 
known, shot noise in the intensity measurements will limit the 
signal-to-noise of the reconstructed particle. The resolution 
of the reconstruction will be compromised if the signal-to- 
noise at the highest spatial frequencies is poor. Because the 
intensities within a speckle are correlated, a natural quantity 
to consider is the total number of photons recorded in a typ- 
ical, high spatial frequency speckle. If we denote this num- 
ber by /i, the associated shot noise magnitude is JJi, and the 
signal-to-noise in the highest frequency features of the recon- 
struction is y/Jb. The scaling of fj, with the experimental pa- 
rameters is obtained by dividing the total number of recorded 
photons, N x Mdata, by the number of speckles. Since the 
latter scales as R 3 , the bound on the signal-to-noise given by 
oriented diffraction patterns and perfect phase reconstruction 
scales as 



SNRoc 



NM da 



R 3 



NM da 



5'. 



(37) 



where the second proportionality follows from ( f34b and de- 
fines a convenient dimensionless measure of the signal. We 
see that S 2 is simply the average number of photons assigned, 
in a classification scheme, to each tomograph in a tomo- 
graphic representation of the 3D intensity when the rotation 
group is sampled at the appropriate resolution. 

When the diffraction data are not oriented, the reduction in 
the information rate, as measured by r(N), leads to a loss in 
signal-to-noise. For small data sets (which may not be suffi- 
cient to reconstruct the particle) this effect is modeled by re- 
placing Afdata with r(A^)Mdata in d37l >. However, as argued 
in Section [VII CI for even modest N, r(N) is close to unity 
and loss of signal-to-noise by this mechanism is minor. 

Provided the photon numbers are reasonable, say r(N) > 
1/2, reconstructions from non-oriented diffraction data will 
fail for the same reason they fail for oriented data: the signal- 
to-noise is simply too small. This is shown in Fig. QJJJ 
where three R = 8 test particles are reconstructed at the 
same r(N) = 0.75 and decreasing values of S (decreasing 
Miata)- We see that below S rs 30 the resolution of the re- 
constructed particle is far less than the intrinsic resolution of 
the R = 8 particle used to generate the data. The EMC algo- 
rithm succeeds at reconstructing a low resolution model of the 
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FIG. 10: The same particle reconstructed with the same average 
number of photons per diffraction pattern, N = 100, and increasing 
data. From left to right in the top panel are shown reconstructions 
with signal-to-noise parameter S = 10, 30, 50; the true particle is 
on the right. The panel below shows the corresponding modulation 
transfer function (MTF) computed by the phasing algorithm. 



particle at the smaller S because the speckles at small spatial 
frequency may have sufficient numbers of photons when the 
speckles at high frequencies do not. We have defined S so that 
the same standard, say S « 30, is meaningful for particles of 
arbitrary size. 



E. A biomolecule at 2 nm resolution 

We prepared an R = 8 GroEL particle (1 nm half-period 
resolution) and simulated diffraction data from its Fourier in- 
tensities as described in Sections [III] and IVII Al respectively. 
The mean photon number was set at N = 100. For parti- 
cles of size R = 8 this implies a reduced information rate 
r(N) = 0.75 (Fig. |7). The EMC intensity reconstruction al- 
gorithm was run with rotation group sampling up to n = 8 
(M rot = 25680) and up to one million diffraction patterns 
(Afdata = 10 6 ). All other parameters used in these simula- 
tions are listed in Sections iHll IIVI and fVl 

As discussed above (Section IVII Dl i. the EMC algorithm 
automatically reconstructs a lower resolution model when the 
number of data is such that only the speckles at low spatial 
frequency have adequate signal-to-noise. Because such lower 
resolution models have coarser angular features, a lower res- 
olution sampling of the rotation group can be used to recon- 
struct them. These observations suggest a simple protocol for 
accelerating the solution process: first obtain a low resolu- 
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FIG. 11: Mutual information I(K, and EMC update magni- 

tude AW as a function of iteration and increasing rotational sam- 
pling n in the intensity reconstruction of the GroEL particle. The 
initial n is small to save time and increased when the mutual infor- 
mation saturates and AW vanishes. With each increase of n the 
number of data were also increased: 2 x 10 5 , 5 x 10 5 , and 10 6 . 
The intensity model after the 17th iteration was used in the phase 
reconstruction of the particle shown in Fig. [T3] 



FIG. 12: (Color online) Log-intensity cross sections after the n = 
5 (left) and n — 8 (right) stages of the intensity reconstruction of 
the GroEL particle. The colors red (high) to yellow (low) span 4 
orders of magnitude. The low frequency speckles are already fully 
reconstructed at rotational sampling n = 5. 



tion model and then refine this by increasing M rot and Mdata 
until the conditions for reconstructing up to the intrinsic res- 
olution are reached. This strategy takes advantage of the fact 
that in practice very few EMC iterations are needed for the fi- 
nal, time-intensive, refinements. Time and memory scaling of 
the algorithm with M rot and Mdata are discussed in Section 

En] 

We began the intensity reconstruction of the GroEL particle 
using n = 5 and A/data = 2 x 10 5 (S — 54). Most of the time 
saving is the result of the reduced rotational sampling, which 
scales as n 3 . A single EMC iteration on a 3 GHz machine at 
these parameter values takes 40 minutes. The reconstruction 
of low frequency speckles can be monitored by visually in- 
specting planar cross sections of the intensity model. A more 
quantitative approach makes use of the mutual information 
I(K,Sl)\w, where W denotes the current intensity model. 





FIG. 13: The R = 8 GroEL particle (top left) compared with the 
results of our reconstruction (top right). The reconstruction was ro- 
tated to bring the two particles into alignment. The resolution of the 
reconstruction is degraded by about a factor of two relative to the 
model used to generate the data. Cross sections of the contrast are 
compared in the bottom row (left: model, right: reconstruction). 



When this quantity saturates, the EMC algorithm is unable 
to improve the orientational accommodation of the data when 
evaluated at the limited rotational sampling. Stagnation of the 
EMC algorithm (AW — 0) also implies the likelihood func- 
tion, of W given the data, cannot be improved with the cur- 
rent settings. To improve both the orientational assignments 
of the data and the likelihood function, the rotational sam- 
pling and the number of data must be increased. Figure [TTJ 
shows the effects of gradually increasing n and Mdata on the 
value of I(K, ft)\w- The refinements with n = 7 and n — 8 
take very few iterations to reach the next (higher) plateau. We 
used the vanishing of AW as the stopping criterion for all the 
EMC stages, including the refinements. One EMC iteration 
in the final refinement stage (n = 8, S = 62) took about 24 
hours. Figure Q~2] compares intensity cross sections after the 
early (n — 5) and late (n = 8) stages of rotational refinement. 

The GroEL particle was reconstructed from the final EMC 
intensity model using the phase retrieval algorithm described 
in Section IVIBI The only other input to this algorithm is 
the support of the particle contrast and the constraint that the 
contrast is non-negative. Altogether there are four sources of 
error that can degrade the quality of the reconstructed parti- 
cle. Three of these are responsible for errors in the inten- 
sity model: finite sampling of the rotation group, finite data 
sets, and grid interpolation errors (finite oversampling). The 
fourth error source is the truncation of the data at spatial fre- 
quencies outside the range g m ; n to g max - The missing infor- 
mation for q > q max has the greater effect, since the non- 
negativity constraint is very effective at reconstructing the 
missing beam-stop (q < q m i n ) intensities when this region 
includes few speckles. Because the weak intensities beyond 
<Zmax are not reconstructed — the algorithm treats them as 
zero — a small incompatibility is introduced in the constraints 
used by the difference-map phasing algorithm. This together 
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FIG. 14: Difference map error metric e in the phase retrieval of the 
GroEL intensity model obtained by the EMC algorithm. A steady 
state of residual phase fluctuations is reached after about 50 itera- 
tions. Averaging the phase reconstruction over the subsequent 200 
iterations produced the particle contrast shown in Fig. [T3]and the 
MTF function in Fig. Q3] 
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FIG. 15: Phase retrieval MTF function for the GroEL particle recon- 
struction. 



with the other sources of error lead to the nonvanishing of 
the difference-map error metric e shown in Fig. [14] and av- 
eraging with respect to residual phase fluctuations is required 
to arrive at a reproducible result. The phase retrieval MTF 
function (Section [VIB 2b computed during the averaging pe- 
riod provides a comprehensive assessment of the internal self- 
consistency of the entire reconstruction process. 

The reconstructed 0.8 MDa GroEL particle is compared in 
Fig. Qj] with the R = 8 resolution model used to generate 
the diffraction data. There is clearly a loss in resolution as a 
result of all the factors described above. From the phase re- 
trieval MTF function, shown in Fig. Q3] we see that contrast 
begins to deteriorate beginning with spatial frequencies about 
half the maximum of those measured (1 nm). Since the MTF 
begins to decline at about 0.5q max , the reconstructed resolu- 
tion is conservatively half the half-period resolution, or 2 nm. 



VIII. COMPUTATIONAL REQUIREMENTS 

Our simulations show that particles can be reconstructed at 
low resolution, R < 8, and modest computational resources 
even when as few as N = 100 photons are recorded on the 
average diffraction pattern. Because the parameters R and N 
are dictated by the physical properties — including damage 
mechanisms — of the sample and available light source and 
are therefore least under the control of the imaging experi- 
ment, it makes sense to assess the feasibility of real recon- 
structions as a function of these parameters. We will see that 
the computational resources are essentially independent of N 
and scale as simple powers of R. This analysis assumes a 
fixed oversampling a and fixed signal-to-noise in the recon- 
structed contrast. 



A. Memory scaling 

The data storage demands are modest and minor relative to 
the memory used by the algorithm when photon counts are 
recorded in a sparse format. A sparse-encoded measurement 
comprises on the order of N integers identifying the detector 
pixels that have non-zero counts. For the pixels with single 
counts the pixel index provides a complete record; the small 
minority of pixels with multiple counts require an additional 
integer. Sparse data storage therefore scales in proportion 
to the total number of measured photons, Mdata x N. At 
fixed signal-to-noise the number of photons aggregated per 
grid point in the intensity reconstruction is fixed. Since the 
number of points in the intensity grid scales as R 3 , the total 
number of detected photons — and the sparse storage space 
— also scales as R 3 . In a parallel implementation of the re- 
construction algorithm it makes sense for all the data to be 
resident in each processor. 

The largest set of variables used by the algorithm are the 
current and updated tomographic representations of the inten- 
sity models, Wij and WL . These have size M p ; x x M rot and 
scale as R 2 x R 3 = R 5 . The actual memory used depends 
strongly also on er, which we assume is kept fixed as R is var- 
ied. Our largest simulations (R = 8, er = 6) used about 1 Gb 
with the memory dominated by these arrays. 

B. Time scaling 

The least certain part of our analysis is estimating the num- 
ber of iterations needed by the EMC algorithm. The sim- 
ulations described in Section IVII CI are consistent with the 
hypothesis that the number of iterations is fixed and small 
provided N exceeds a modest information theoretic thresh- 
old (see Fig. [8). Because this minimum N (see Fig. 13 is 
believed to grow only logarithmically with R, we assume the 
criterion can always be satisfied and thus the number of EMC 
iterations is practically independent of R. 

The most time-intensive operation in each iteration of the 
EMC algorithm is the expectation maximization step (M) 
where the photon counts in each measurement, K^, are 
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cross-correlated with the model log-intensities in each rota- 
tional sample, log Wij . The number of operations scales as 
N x A/data x M TO t after the sum over the i-index, using the 
sparse data representation, is reduced to a sum of N terms. As 
argued above, N x Mdata scales as R 3 and M rot as R 3 , giv- 
ing a time per iteration that scales as R 6 . This represents the 
time scaling of the reconstruction, since the number of itera- 
tions is independent of R and the other two steps of the EMC 
update (E and C) are much faster because they do not involve 
the data. 



C. Parallel implementation 

Since both the memory and time scaling is dominated by 
operations on the tomographic intensity models Wij and W[a , 
in a parallel implementation these and the cross-correlations 
on them would be distributed among the processors so as to 
minimize message passing. A natural approach is to have a 
separate master node perform the E step in such a way that 
blocks of the tomographic models (ranges of the j-index) are 
sent to different processors. After cross-correlating with all 
the data, which requires no message passing, each processor 
will send its results back to the master node for aggregation 
in the C step. In this scheme both memory per processor 
and time of the reconstruction are reduced in proportion to 
the number of processors. 



IX. CONCLUSIONS 

The aim of this study was to provide a detailed assessment 
of the feasibility and quality of reconstructions for the pro- 
posed single-particle imaging experiments by testing the per- 
formance of a particular algorithm developed for this purpose. 
The many dozens of reconstructions required to map out the 
parameters space could not have been carried out had the op- 
eration of the algorithm not become a fairly routine process. 
We never encountered a situation where the intensity recon- 
struction (EMC algorithm) had to be abandoned or restarted, 
or where the subsequent phase reconstruction did not repro- 
duce the true particle to the expected resolution. Our attention 
to experimental details {e.g. missing central data) in the simu- 
lations gives us confidence that the algorithm developed here 
will also succeed with real data. 

A variant of our algorithm was previously studied in the 
context of a minimal model having a single rotation angle as 
missing data [HI. In the introduction we argued that in some 
respects the minimal model reconstructions might be harder 
than the reconstructions in a realistic model, where the diffrac- 
tion data span the entire 3D rotation group. This scenario has 
been confirmed by our simulations. Recall that in the min- 
imal model a 2D intensity distribution is sampled by non- 
intersecting ID diffraction patterns, while in single-particle 
imaging the intensity is 3D and the 2D diffraction patterns 
intersect pairwise along arcs. In the minimal model the to- 
mographic representation of the intensity is non-redundant, 
while in the 3D problem the tomographs are highly redundant 



and mutual consistency has to be imposed with the E and C 
steps of the EMC algorithm. The structure of the intersecting 
diffraction patterns and redundant variables in the 3D problem 
provide a mechanism, absent in the minimal model, that ac- 
celerates the reconstruction. The redundancy is greatest near 
the origin, where many diffraction patterns pass through the 
same speckles. Because the signal-to-noise is also greatest 
in this region, the reconstruction can begin there in a consis- 
tent fashion and then progress to higher resolution shells. By 
contrast, the rotational classification of diffraction patterns in 
the minimal model is not incremental and requires many more 
model-update iterations. 

Reconstruction algorithms for the first round of experi- 
ments will have to address two additional complications not 
considered in our simulations. The first is the large shot-to- 
shot fluctuation in the incident photon flux of the source when 
the FEL process is unseeded [10]. This adds another miss- 
ing datum to the three orientational parameters, per diffrac- 
tion pattern, that the expectation maximization step of the al- 
gorithm will have to reconstruct. The second complication is 
the background of photon counts arising from non-target par- 
ticles, upstream beam optics and inelastic scattering. To deal 
with this the reconstruction algorithm should make use of the 
averaged dark count (no target particle). This represents ad- 
ditional data and requires a modification of the conditional 
probability computations. 
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APPENDIX A: MUTUAL INFORMATION IDENTITIES 

Given a trio of random variables K, 51 and W, we can eval- 
uate the mutual information between one of them, say K, and 
the other pair, (Q, W), treated as a single random variable. 
Writing the mutual information in terms of the entropy func- 
tion H, we have 

I(K,(Sl,W)) = H(K)-H(K)\ {n , w) (Al) 

- H{K)-H(K)\n + H(K)\n-H{K)\ {n ,w) 

= I(K,Q)+I(K,W)\ n (A2) 

Interchanging and W in this derivation gives the identity 

I(K,(n,W))=I(K,W)+I(K,n)\ w . (A3) 

Combining the two identities above we obtain the general re- 
sult 

J{K,Q)+I(K,W)\a =I(K,W)+I(K,Sl)\ w . (A4) 

For our specific choice of random variables the mutual infor- 
mation I(K, 17) vanishes identically because a measurement 
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K confers no information about the orientation ft since the 
ensemble of models W itself has an orientational degree of 
freedom that is uniformly distributed. Our identity thus in- 
volves only three terms: 



I(K, W)\ a =I(K,W)+I(K,Q)\ 



(A5) 



APPENDIX B: EXPECTATION MAXIMIZATION DETAILS 

1. Likelihood maximization 

Rearranging the order of the sums in the definition of the 
log-likelihood function we obtain 

Mpix M rot 

W) = E E lo g w 'n - B i w '^ < B1 > 
i=i j=i 



where 



Aij = Pjk{W)K lk 

k=l 

M data 

Bj = E p ^ w )- 



(B2) 



k=l 



Each term of the sum ( IBU is of the form a log W — bW where 
a and b are positive constants. Since the terms are indepen- 
dent, the global maximum is achieved when each term is max- 
imized with the value W = a/b. 



2. Fixed-point rotational invariance 

We wish to show that the intensities of the true model, W, 
or more generally any rotation of this model, W R , is a fixed 
point of the maximization update rule 111 It , Given the true 
model we can write down the probability distribution of the 
photon counts K: 



M Tot ^ 



(B4) 



where the joint Poisson distribution 

Rj{W, K)=l[-^ r exp (-Wy) (B5) 



is the same (up to an irrelevant factor) as ((HJ but the data index 
k has been replaced by the function argument K representing 
an arbitrary vector of photon counts. Because the probability 
distribution on K is unchanged if the model is rotated, we 
have 

M Iat M rot 

Y, ™i *i K)=Y «>i Rj (W, K), (B6) 



for arbitrary rotations R. The distribution (IB4t and the invari- 
ance dB6t are approximations that become exact in the limit 

Mrot — > CO. 

Taking the numerator of the maximization update rule ( fTTT i 
and evaluating it for the rotated model W R with the data sum 
replaced by a sampling of P(K), we have 



]T P ]k (W R )K lk 



(B7) 



fc=i 



Aw^n/of pM^l ) Ki . 

Substituting ( IB4I ) for P(K) and using ( lB61 l. this reduces to 

A/data 

E Pjk(W n )K ik 

M data J2 w i R 3 (^ R > K ) K i (B8) 



k=l 



K 



M data WjW; 



R 

3" ij 



(B9) 



by the property that the mean of Ki for the Poisson distribu- 
tion is Wtf. By the same steps, the denominator of the update 



(B3) rule gives 



Mi 



Pjk(W R )^M dataWj , 



k=l 



thus showing the desired fixed point property 
M : W R -> W R . 



(BIO) 



(Bll) 



3. Mutual information formula 

To evaluate the mutual information I{K,Q)\w using the 
quantities used by the EMC algorithm we need to approximate 
the integral over orientations by a sum over the discrete 
samples j and expectation values over photon counts Ki by a 
normalized sum over the counts Ki k in the actual data (k is 
the data index). 

Suppressing the model variables W, which we treat as a 
fixed quantity whenever I(K, Vt)\w is calculated by the EMC 
algorithm, the mutual information is given by 

W") = I E P(K)P(m)^g^^-. (B12) 

Replacing the il-integral by a weighted sum over samples j 
and the X-expectation by a sum over the data, we obtain 



M rot 
\ " 



1 



■ , 1 -Mdata , 
3 = 1 k=l 



Mdata 



u.,P(^|^ fe ) 



(B13) 



We recover formula ( fT9l with the identification 

to J -P(n j -|Jf*)=P iJk (W). (B14) 
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APPENDIX C: ROTATION GROUP SAMPLING BASED ON 
THE 600-CELL 

The quaternion parameterization of 3 x 3 orthogonal matri- 
ces is given by the formula 



(1 — 2gf — 2q\ 2q x q 2 + 2q Q q 3 2qiq 3 - 2q a q 2 
2q 2qi - 2q q 3 \-2q\- 2q\ 2q 2 q 3 + 2q qi 
2q 3 q x + 2q q 2 2q 3 q 2 - 2g <7i 1 - 2qf - 2q\ 

(CI) 

where the unit quaternions q = (qa,qi,q 2 ,q 3 ) are points on 
the unit 3-sphere. Because of the property H(q) = R(— q), 
the unit quaternions are mapped 2-to-l to the elements of the 
3D rotation group. The multiplication rule for quaternions is 
most transparent in the 2x2 spin- 1/2 representation 



q = q +i(T -q, 



(C2) 



(<r are the Pauli matrices) which defines the "scalar" (qo) and 
"vector" (q) parts of the 4-vector. The scalar part encodes 
just the rotation angle, 6, while the vector part also carries 
information about the rotation axis, n: 



qo = cos (6</2) q = sin(0/2)n. 



(C3) 



From (IC21 > and properties of the Pauli matrices one can verify 
that the inverse of a rotation is obtained by reversing the sign 
of the vector part. This fact is consistent with equations (ICU 
and (E3). 

Since the rotation required to move the element q to the el- 
ement q' is the quaternion q'q^ 1 , a group-invariant distance 
between these elements should only be a function of the rota- 
tion angle (conjugacy class) of q'q^ 1 , that is, its scalar part: 
Qoq'o + q ' Q ■ Since the latter equals 



q . q ' = l-\\ q -q'\\ 2 /2, 



(C4) 



the standard Euclidean distance on the quaternion 3-sphere is 
a group invariant distance between rotation group elements. 
This last remark is key for generating efficient samplings of 
the 3D rotation group: the unit quaternions should be placed 
to efficiently cover the 3-sphere 11511 . A cover is optimal if it 
has the minimum number of points with the property that an 
arbitrary point of the space is always within a given covering 
radius r c of some point in the cover. In our case the covering 
radius corresponds to a rotation angle 86, such that an arbi- 
trary rotation group element is always within a rotation by 86 
from one of our samples. If q is a point of the cover and q' an 
arbitrary point, then 

rl > h-q'\\ 2 
= 2-2q-q' 

= 2-2(,' g - 1 ) 
= 2-2cos((9/2) 
« (6/2)\ 

The covering radius, of the 3-sphere by quaternions, and the 
angular resolution 86 of the rotation group sampling are there- 
fore related by: 



Our covers of the 3-sphere are based on the highly sym- 
metric 4-dimensional polytope that has the greatest number of 
regular tetrahedra — 600 — as its 3D facets. Known as the 
{3, 3, 5} polytope, or 600-cell, it is the 4D analog of the regu- 
lar icosahedron 1160 . The tetrahedral facets of the 600-cell are 
well covered by points arranged in the fee lattice. We will use 
the integer n to describe the degree of the refinement of each 
tetrahedron by points of the fee lattice. For example, when 
n = 4, each tetrahedron edge is divided into 4 equal segments 
thus introducing n — 1 = 3 edge points. This value of n also 
introduces (n — l)(n — 2)/2 = 3 points on each tetrahedron 
face and (n — 1 ) (n — 2) (n — 3)/6 = 1 points in the interior of 
the tetrahedron. The {3, 3, 5} polytope has 120 vertices, 720 
edges, 1200 faces, and 600 cells. Combining this information 
with the point counts of the tetrahedron refinements, we ob- 
tain the following formula for the number of sample points on 
the 3-sphere: 



2M mt {n) 
= 120 + 720(ra- 1) 



1200 



(n-l)(n-2) 



+600 



(n- l)(n-2)(n-3) 



(> 



20(n + 5n 3 



The factor of 2 on the left side takes into account the over- 
counting by ± quaternion pairs. 

To determine the appropriate level of refinement n, we need 
to compute the corresponding angular resolution 89(n). Con- 
sider one tetrahedral cell of {3, 3, 5}; in canonical coordinates 
lfl6ll the vertices are 



^3 
' ! 4 



(1,0,0,0) 
i(r,l,l/T,0) 

*M/t,0,1) 
i(t,0,1,1/t), 



where r = (1 + y/5)/2 is the golden mean. The edge length 
of this tetrahedron is 1/r; the tetrahedra of the refinement will 
have edge length l/(n r) and this is also the minimum dis- 
tance between fee lattice points and \/2 times the covering 
radius r c . When the fee lattice points are projected to the unit 
3-sphere by rescaling, the points near the center of the tetrahe- 
dron are expanded by the greatest amount. The linear expan- 
sion factor is given by the reciprocal of the distance between 
the origin and the tetrahedron center: 



4/||ui + v 2 + v 3 + Vi\\ = Vs/i 



1.080. 



(C6) 



The coarsest part of the sampling thus has minimum distance 
r c = 2/ (nr 3 ). Using dC51 l we obtain: 



(n) = 4/(nr 3 ) w 0.944/n. 



(C7) 



86/2. 



(C5) 



The samples of the 3D rotation group in this scheme carry 
non-uniform weights. Because the measure on the contin- 
uous group is just the volume element of the 3-sphere, the 
weight associated with a sample is proportional to the vol- 
ume of its associated Voronoi cell. These weights/volumes 
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are non-uniform as a result of two effects. First, there is a cor- 
rection that affects the samples at the vertices and edges of the 
600-cell, where the joining of the tetrahedral cells results in 
angular deficits. The second correction to the weights arises 
from the non-uniform distortion of the Voronoi cells, when 
these are projected from the 600-cell to the 3-sphere. 
A regular (flat) tetrahedron has dihedral angle 



_1 (l/3) « 70.5°. 



(C8) 



Because five tetrahedra meet at every edge of the 600-cell, the 
fractional volume associated with samples on edges is given 
by 



/i = 5a/(27r) w 0.979566. 



(C9) 



The spherical angle subtended at a vertex of the regular tetra- 
hedron, by spherical trigonometry, is 3a — 7r. Because 20 
tetrahedra meet at every vertex of the 600-cell, each vertex 
sample has fractional volume 



fo = 20(3a - 7r)/(47r) « 0.877398. 



(CIO) 



Since there are no deficits at samples on faces or within the 
cells of the polytope, 



h = h = i. 



(Cll) 



The volume change upon projecting a Voronoi cell from a 
{3, 3, 5} facet to the unit 3-sphere is the result of two things: 
(1) a uniform expansion by the linear scale factor l/||g||, 



where q is the (non-unit) quaternion of the sample on {3, 3, 5}, 
and (2) projection of the 3-space of the tetrahedral cell to the 
tangent space of the 3-sphere at q, the projection of q. The 
second of these produces a reduction by the factor q ■ c, where 
c is the unit outward normal vector to the facet on which q 
resides. In terms of the four cell-vertices, 



Vi + V2+V3 + V4 
\\V1 + V2 + V3 + U4|| 



(C12) 



The overall (unnormalized) weight of a sample q, originating 
from sample q on the 600-cell, is given by 



w(q) = fk 



g • c 



(C13) 



where k — 0,1,2,3 is the associated dimensionality of the 
sample (vertex, edge, etc.). Vertex samples have the lowest 
weight, samples at the cell center the highest; their ratio is 



^vertex _ JO 
^center fz 



V8 



0.644. 



(C14) 



In our pre-computed tables (for given n) of quaternion sam- 
ples qj we include the weights Wj oc w(qj) as the fifth com- 
ponent. All formulas in this paper assume the normalization 



J2 W * = L 



(C15) 
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